This is an exploration into the following question:
How do public high schools police bodies differently?
As such, I have been surveying public high schools with available dress codes for the 2018-2019 school year (and no uniform policy) across the US. I found a list of potential schools using the National Center for Education Statistic’s search function for public schools. I limited the resulting schools to just those that had a web address listed. I then scraped the homepage content for each website and searched for words like handbook, dress code and code of conduct. I further filtered my list of schools to just those that contained one of the above phrases. Then I manually visited each of the resulting (2000+) websites to find the actual dress code and to verify that there was no uniform policy and that the dress code was from the 2018 - 2019 school year. I ended up with a list of 831 schools considered to be “regular”, “non-magnet”, “non-boarding” high schools that met my qualifications. I’ve been manually collecting information from them in this Google Form.
Now to process that data.
To start, I’ll only need a few packages, mostly to connect me with the Google Sheets where the data from my form is being collected.
library(tidyverse)
library(here)
library(maps)
I’ll download the data from Google Drive and set the overwrite permissions to TRUE so that I can re-run this analysis easily as I update more data.
googledrive::drive_download("Dress Code Responses", path = here::here("raw_data", "collected", "responses.csv"), type = "csv", overwrite = TRUE)
responses <- read.csv(here::here("raw_data", "collected", "responses.csv"), stringsAsFactors = FALSE, header = TRUE, na.strings = c("", " "))
Because of the way that Google Sheets works, we end up with very wide data. That is, each clothing item is listed as a column instead of having several rows per school. Therefore, my data needs to go from wide to long.
In order to do that, I need to first split out information about specific item length and strap width since these data were entered in a slightly different way than the rest.
length <- responses %>%
select(c("School.Name", "School.State.Abbreviation"), contains("length.limit")) %>%
rename(limits = !!names(.[3]), length = !!names(.[4])) %>%
filter(limits != "") %>%
# Separate comma delimited list of items
separate_rows(limits, sep = ",") %>%
mutate(limits = trimws(limits))
How many schools regulate the length of clothing items?
schools <- length %>%
distinct(School.Name)
# Return count
nrow(schools)
## [1] 252
# Return percentage
nrow(schools) / nrow(responses)
## [1] 0.7455621
Alright, so about 74% of the schools that I’ve surveyed so far have some length limitations.
What are those length limitations?
lengthLimits <- length %>%
separate_rows(length, sep = ",") %>%
filter(!grepl("\\(?[0-9,.]+\\)?", length)) %>%
count(length, sort = TRUE) %>%
filter(n >= 10)
lengthLimits
## # A tibble: 8 x 2
## length n
## <chr> <int>
## 1 shorter than fingertips 180
## 2 "\"short\"/\"inappropriate length\" (no exact measurement given)" 121
## 3 shorter than mid-thigh 116
## 4 shorter than x inches from knee (insert amount in other) 108
## 5 shorter than the knee 36
## 6 " shorter than x inch inseam (insert amount in other)" 22
## 7 " shorter than x inches from knee (insert amount in other)" 11
## 8 " \"short\"/\"inappropriate length\" (no exact measurement given)" 10
What about by item type?
lengthByItem <- length %>%
count(limits, sort = TRUE) %>%
mutate(type = "length",
limits = paste0("short ", limits)) %>%
rename(item = limits)
lengthByItem
## # A tibble: 10 x 3
## item n type
## <chr> <int> <chr>
## 1 short shorts 241 length
## 2 short skirts 208 length
## 3 short dresses 122 length
## 4 short skorts 12 length
## 5 short pants 7 length
## 6 short jumpers 4 length
## 7 short capris 1 length
## 8 short clothing 1 length
## 9 short shirts 1 length
## 10 short tunics 1 length
straps <- responses %>%
select(c("School.Name", "School.State.Abbreviation"), contains("shirt.straps")) %>%
rename(limits = !!names(.[3])) %>%
filter(limits != "") %>%
# Separate comma delimited list of items
separate(limits, into = c("limits", "inches"), sep = ",") %>%
mutate(limits = trimws(limits))
## Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [27].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 19 rows [1,
## 11, 12, 18, 24, 31, 36, 38, 39, 45, 54, 55, 62, 72, 76, 77, 78, 79, 82].
How many about strap width?
schoolsStrap <- straps %>%
distinct(School.Name)
# Return count
nrow(schoolsStrap)
## [1] 90
# Return percentage
nrow(schoolsStrap) / nrow(responses)
## [1] 0.2662722
Only 28% have some limitations on strap width explicitly laid out in the handbook.
In order to figure out all of the banned items, we need to eliminate some columns of unneccessary information.
banned <- responses %>%
select(-contains("sanctions"), - contains("shirt.straps"), -contains("length.limit"))
elongateCode <- function(type){
if(type == "clothing" || type == "body"){
regex <- paste0("other.", type)
} else {
regex <- paste0("\\b", type, "\\b.*?\\gender\\b")
}
bonusColumn <- colnames(select(banned, matches(regex)))
new <- banned %>%
# First sort out any additional items that we added to our bonus column
separate_rows_(bonusColumn, sep = ",") %>%
separate(bonusColumn, into = c("item", "prohibited"), sep = ":") %>%
mutate(type = type) %>%
select(c(1:6), c(item, prohibited, type)) %>%
filter(!is.na(prohibited)) %>%
mutate(prohibited = ifelse(prohibited == "NA", "none", prohibited))
new2 <- banned %>%
select(-contains("any.other")) %>%
gather(key = item, value = prohibited, colnames(select(., contains(!!type)))) %>%
filter(!is.na(prohibited)) %>%
mutate(item = gsub("(.*\\.{3})", "", item),
item = gsub("\\.", " ", item),
item = trimws(item)) %>%
mutate(type = type) %>%
select(c(1:6), c(item:type))
combined <- rbind(new, new2)
}
clothingTypes <- c("accessories", "shirt", "skirt.dress", "pants", "shorts", "undergarment", "footwear", "headwear", "grooming", "body", "clothing")
longDressCode <- map_dfr(clothingTypes, elongateCode)
#write.csv(longDressCode, here("processed_data", "clean_dress_code.csv"), row.names = FALSE)
Some schools do explicitly ban things for either male or female students.
How many schools explicitly ban things for male vs. female students?
explicitSchools <- longDressCode %>%
filter(prohibited != "none") %>%
separate_rows(prohibited, sep = ",") %>%
mutate(prohibited = trimws(prohibited),
prohibited = case_when(
prohibited == "boys" ~ "male",
prohibited == "Na" ~ NA_character_,
prohibited == "NA" ~ NA_character_,
TRUE ~ prohibited
)) %>%
filter(prohibited == "male" | prohibited == "female") %>%
group_by(School.Name, prohibited) %>%
count()
nrow(explicitSchools) / nrow(responses)
## [1] 0.2633136
Alright so 26% of schools have banned at least one item explicitly for either male or female students.
explicitCount <- longDressCode %>%
filter(prohibited != "none") %>%
separate_rows(prohibited, sep = ",") %>%
mutate(prohibited = trimws(prohibited),
prohibited = case_when(
prohibited == "boys" ~ "male",
prohibited == "Na" ~ NA_character_,
prohibited == "NA" ~ NA_character_,
TRUE ~ prohibited
)) %>%
filter(prohibited == "male" | prohibited == "female") %>%
group_by(School.Name, prohibited) %>%
count() %>%
spread(prohibited, n) %>%
replace(is.na(.), 0) %>%
mutate(diff = (abs(male-female)) / ((male + female) / 2))
mean(explicitCount$diff)
## [1] 0.9213809
How many things are explicitly banned for male vs. female students?
explicitCount <- longDressCode %>%
filter(prohibited != "none") %>%
separate_rows(prohibited, sep = ",") %>%
mutate(prohibited = trimws(prohibited),
prohibited = case_when(
prohibited == "boys" ~ "male",
prohibited == "Na" ~ NA_character_,
prohibited == "NA" ~ NA_character_,
TRUE ~ prohibited
)) %>%
filter(prohibited == "male" | prohibited == "female") %>%
count(prohibited)
explicitCount
## # A tibble: 2 x 2
## prohibited n
## <chr> <int>
## 1 female 132
## 2 male 210
So it’s more common to explicitly ban things for male students.
Let’s take a look at what those things were.
explicit <- longDressCode %>%
filter(prohibited != "none") %>%
separate_rows(prohibited, sep = ",") %>%
mutate(prohibited = trimws(prohibited),
prohibited = case_when(
prohibited == "boys" ~ "male",
prohibited == "Na" ~ NA_character_,
prohibited == "NA" ~ NA_character_,
TRUE ~ prohibited
)) %>%
filter(prohibited == "male" | prohibited == "female") %>%
group_by(prohibited) %>%
count(item, sort = TRUE) %>%
group_by(prohibited) %>%
head(n = 10)
explicit
## # A tibble: 10 x 3
## # Groups: prohibited [2]
## prohibited item n
## <chr> <chr> <int>
## 1 male earrings 18
## 2 male sleeveless 16
## 3 female midsection midriff 11
## 4 female piercings other than ear lobes 10
## 5 female spaghetti string straps 10
## 6 male piercings other than ear lobes 8
## 7 female low necklines 7
## 8 male midsection midriff 7
## 9 male muscle shirts open sides 7
## 10 male hairstyles that obstruct vision 6
ggplot(explicit, aes(x = item, y = n, fill = prohibited)) + geom_bar(stat = "identity", position = "dodge") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
maleHair <- longDressCode %>%
filter(prohibited != "none") %>%
separate_rows(prohibited, sep = ",") %>%
mutate(prohibited = trimws(prohibited),
prohibited = case_when(
prohibited == "boys" ~ "male",
prohibited == "Na" ~ NA_character_,
prohibited == "NA" ~ NA_character_,
TRUE ~ prohibited
)) %>%
filter(prohibited == "male" | prohibited == "female") %>%
mutate(hair = grepl("collar|lobe", item))
Alright, so there aren’t too many items that are explicitly gendered. Let’s look at what types of items are banned overall and then I’ll get into things that are implicitly targetting students of a particular gender.
Which body parts are specifically prohibited the most on high school campuses?
What types of clothing have been specifically prohibited?
Sometimes descriptive words are used instead of explicit clothing items (showing items prohibited by a minimum of 3 schools).
Typically before listing the actual items that are prohibited, schools give a brief rationale for why their dress codes exist. What words show up the most there?
I need to create a list of items that have been prohibited so I can get a better sense of how many are targetted towards specific types of students.
bannedItems <- longDressCode %>%
group_by(type, item) %>%
count(item, sort = TRUE)
bannedwithLength <- bannedItems %>%
ungroup() %>%
bind_rows(lengthByItem) %>%
arrange(desc(n))
write.csv(bannedItems, here::here("processed_data", "bannedItems.csv"), row.names = FALSE)
I’ll now upload this to Google Drive to make some manual annotations.
Let’s take a look again at the banned items highlighting the gender (if any) that these items implicitly target.
Which body parts are specifically prohibited the most on high school campuses?
What types of clothing have been specifically prohibited?
Sometimes descriptive words are used instead of explicit clothing items (showing items prohibited by a minimum of 3 schools).
If we were to look at the breakdown between the banned items that are implicitly targetted at female students vs. male students: (100% = 100% female targeted, 0% = 100% male targeted). This calculation does not include the banned items that are not implicitly targeted at either male or female students.
If we do look at the percentage of female gender-targetted items compared to all items, here is the distribution:
## Warning: Removed 1 rows containing non-finite values (stat_bin).
And male gender-targeted:
If we do look at the percentage of female gender-targetted items compared to all items, here is the distribution:
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (stat_bin).
Similarly, this is the percentage of items in the dresscode implicitly target students of color (with 100% = 100% poc targetted, and 0% = 0% poc targetted).
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
histogramData <- bannedGenderAll %>%
select(c(School.Name, pFem, pMal)) %>%
left_join(bannedRace) %>%
select(-c(n, y)) %>%
mutate(wCat = 0) %>%
mutate(femCat = case_when(
pFem <= 5 ~ 0,
between(pFem, 5, 10) ~ 1,
between(pFem, 10, 15) ~ 2,
between(pFem, 15, 20) ~ 3,
between(pFem, 20, 25) ~ 4,
between(pFem, 25, 30) ~ 5,
between(pFem, 30, 35) ~ 6,
between(pFem, 35, 40) ~ 7,
between(pFem, 40, 45) ~ 8,
between(pFem, 45, 50) ~ 9,
between(pFem, 50, 55) ~ 10,
between(pFem, 55, 60) ~ 11,
between(pFem, 60, 65) ~ 12,
between(pFem, 65, 70) ~ 13,
between(pFem, 70, 75) ~ 14,
between(pFem, 75, 80) ~ 15,
between(pFem, 80, 85) ~ 16,
between(pFem, 85, 90) ~ 17,
between(pFem, 90, 95) ~ 18,
between(pFem, 95, 100) ~ 19,
TRUE ~ pFem
),
malCat = case_when(
pMal <= 5 ~ 0,
between(pMal, 5, 10) ~ 1,
between(pMal, 10, 15) ~ 2,
between(pMal, 15, 20) ~ 3,
between(pMal, 20, 25) ~ 4,
between(pMal, 25, 30) ~ 5,
between(pMal, 30, 35) ~ 6,
between(pMal, 35, 40) ~ 7,
between(pMal, 40, 45) ~ 8,
between(pMal, 45, 50) ~ 9,
between(pMal, 50, 55) ~ 10,
between(pMal, 55, 60) ~ 11,
between(pMal, 60, 65) ~ 12,
between(pMal, 65, 70) ~ 13,
between(pMal, 70, 75) ~ 14,
between(pMal, 75, 80) ~ 15,
between(pMal, 80, 85) ~ 16,
between(pMal, 85, 90) ~ 17,
between(pMal, 90, 95) ~ 18,
between(pMal, 95, 100) ~ 19,
TRUE ~ pMal
),
raceCat = case_when(
pRac <= 5 ~ 0,
between(pRac, 5, 10) ~ 1,
between(pRac, 10, 15) ~ 2,
between(pRac, 15, 20) ~ 3,
between(pRac, 20, 25) ~ 4,
between(pRac, 25, 30) ~ 5,
between(pRac, 30, 35) ~ 6,
between(pRac, 35, 40) ~ 7,
between(pRac, 40, 45) ~ 8,
between(pRac, 45, 50) ~ 9,
between(pRac, 50, 55) ~ 10,
between(pRac, 55, 60) ~ 11,
between(pRac, 60, 65) ~ 12,
between(pRac, 65, 70) ~ 13,
between(pRac, 70, 75) ~ 14,
between(pRac, 75, 80) ~ 15,
between(pRac, 80, 85) ~ 16,
between(pRac, 85, 90) ~ 17,
between(pRac, 90, 95) ~ 18,
between(pRac, 95, 100) ~ 19,
TRUE ~ pRac
)) %>%
select(-c(pFem, pRac, pMal)) %>%
gather(type, group, -School.Name) %>%
rename(school = School.Name) %>%
mutate(type = case_when(
type == "femCat" ~ "f",
type == "malCat" ~ "m",
type == "raceCat" ~ "c",
type == "wCat" ~ "w",
TRUE ~ "other"
))
## Joining, by = "School.Name"
# Save into JS analysis folder
write.csv(histogramData, "../src/assets/data/histogramData.csv", row.names = FALSE, na = "")
intersections <- banned %>%
replace_na(list(race = "n", gender = "n")) %>%
group_by(gender, race) %>%
count()
intersections
## # A tibble: 6 x 3
## # Groups: gender, race [6]
## gender race nn
## <chr> <chr> <int>
## 1 f n 2128
## 2 f y 48
## 3 m n 504
## 4 m y 980
## 5 n n 2829
## 6 n y 63
Of the 300 schools I’ve collected data for so far, what is their state distribution?
byState <- responses %>% count(School.State.Abbreviation, sort = TRUE) %>%
mutate(state = state.name[match(School.State.Abbreviation, state.abb)]) %>%
mutate(state = tolower(state))
ggplot(byState, aes(x = reorder(School.State.Abbreviation, n), y = n)) + geom_bar(stat = "identity")
Let’s try to look at this on a map:
us <- map_data("state")
ggplot() + geom_map(data = us, map = us, aes(x = long, y = lat, map_id = region),
fill = "#ffffff", color="#ffffff", size = 0.15) +
geom_map(data = byState, map = us, aes(fill = n, map_id = state)) +
scale_fill_continuous(low='thistle2', high='darkred',
guide='colorbar')
## Warning: Ignoring unknown aesthetics: x, y
Alright, so Texas and NY are pretty high here and we’re missing some decent chunks of the country but overall, not terrible.
Let’s combine our demographic data with our dress code data
shuffledHandbooks <- read.csv(here::here("processed_data", "shuffledHandbooks.csv"), stringsAsFactors = FALSE, header = TRUE) %>%
mutate(schoolName = trimws(schoolName),
stateAbb = trimws(stateAbb))
demo <- banned %>%
mutate(schoolName = trimws(School.Name),
stateAbb = trimws(School.State.Abbreviation)) %>%
left_join(shuffledHandbooks)
## Joining, by = c("schoolName", "stateAbb")
Are schools with a lower percentage of white students more likely to implicitly target students of color?
raceDemo <- demo %>%
left_join(bannedRace, by = "School.Name")
ggplot(raceDemo, aes(x = pWhite, y = pRac)) + geom_point() + xlab("percentage of white students") + ylab("percentage of implicitly racial prohibited items")
## Warning: Removed 7 rows containing missing values (geom_point).
This appears to be either flat, or slightly the opposite of what I expected. Let’s look for a quick correlation.
cor(raceDemo$pRac, raceDemo$pWhite)
## [1] NA
That is pretty low, looks like there isn’t much of a correlation between the population of white students and items specifically targetted at students of color. However, that may change if we look specifically at the population of black students (since many of these banned items are specifically targeted at black students).
ggplot(raceDemo, aes(x = pBlack, y = pRac)) + geom_point()+ xlab("percentage of black students") + ylab("percentage of implicitly racial prohibited items")
## Warning: Removed 7 rows containing missing values (geom_point).
Again, really flat.
cor(raceDemo$pBlack, raceDemo$pRac)
## [1] NA
Slightly higher correlation than with white students, but still nothing explanatory. Maybe take a quick look at the state distribution.
ggplot(raceDemo, aes(reorder(School.State.Abbreviation, pRac), pRac)) + geom_point() + xlab("state") + ylab("percentage of implicitly racial prohibited items")
There’s a pretty wide spread within and between schools in a single state. Maybe there’s a difference in gender-targetted policies within/between states.
genDemo <- demo %>%
left_join(bannedGender, by = "School.Name")
ggplot(genDemo, aes(reorder(School.State.Abbreviation, pFem), pFem)) + geom_point() + xlab("state") + ylab("percentage of implicitly gender prohibited items")
## Warning: Removed 19 rows containing missing values (geom_point).
Not so much. Perhaps gender differences by the school locale?
ggplot(genDemo, aes(reorder(locale, pFem), pFem)) + geom_point() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("school locale") + ylab("percentage of implicitly gender prohibited items")
## Warning: Removed 19 rows containing missing values (geom_point).
Not so much. Perhaps race differences by the school locale?
ggplot(raceDemo, aes(reorder(locale, pRac), pRac)) + geom_point() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("school locale") + ylab("percentage of implicitly racial prohibited items")
Seemingly a few more outliers here, but no major trends.